The listing of claims will replace all prior versions, and listings, of claims in the application:

Listing of Claims:

1. (withdrawn) A method for identifying amino acid residues for variation in a
protein variant library in order to affect a desired activity, said method comprising: 

(a) receiving data characterizing a training set of a protein variant library, 
wherein protein variants in the library have systematically varied sequences, and 
wherein the data provides activity and sequence information for each protein variant in
the training set;

(b) from the data, developing a sequence activity model that predicts activity as a 
function of amino acid residue type and corresponding position in the sequence; and 

(c) using the sequence activity model to identify one or more amino acid residues at
specific positions in the systematically varied sequences that are to be varied in order to impact
the desired activity.

2. (withdrawn) The method of claim 1, further comprising:

(d) using the sequence activity model to identify one or more amino acid residues that
are to remain fixed in a new protein variant library.

3. (withdrawn) The method of claim 1, wherein the protein variant library 
comprises naturally occurring proteins or proteins derived therefrom. 

4. (withdrawn) The method of claim 3, wherein the naturally occurring proteins 
comprise proteins that are encoded by members of a single gene family. 

5. (withdrawn) The method of claim 1, wherein the protein variant library
comprises proteins that are obtained by using a recombination-based diversity generation 
mechanism. 

6. (withdrawn) The method of claim 1, further comprising performing DOE to
identify the systematically varied sequences. 

7. (withdrawn) The method of claim 1, wherein the activity is not protein stability.

8. (withdrawn) The method of claim 1, wherein the sequence activity model is a 
regression model. 

9. (withdrawn) The method of claim 1, wherein the sequence activity model is a
partial least squares model. 

10. (withdrawn) The method of claim 1, wherein the sequence activity model is a 
neural network.

11. (withdrawn) The method of claim 1, wherein using the sequence activity model
to identify one or more amino acid residues further comprises identifying sequences for use in a
recombination-based diversity generation mechanism, wherein said sequences comprise 
variations in the one or more amino acid residues identified in (c). 

12. (withdrawn) The method of claim 1, wherein using the sequence activity model
comprises identifying a sequence predicted by the model to have a highest value of the desired 
activity.

13. (withdrawn) The method of 12, wherein using the model further comprises
selecting subsequences of the best sequence.

14. (withdrawn) The method of claim 1, wherein using the sequence activity model 
to identify one or more amino acid residues comprises using the sequence activity model to rank 
residue positions in order of impact on the desired activity. 

15. (withdrawn) The method of claim 1, wherein using the sequence activity model 
to identify one or more amino acid residues comprises using the sequence activity model to rank 
residue types at residue positions in order of impact on the desired activity. 

16. (withdrawn) The method of claim 1, wherein using the model comprises using
the model as a fitness function in a genetic algorithm. 

17. (withdrawn) The method of claim 1, wherein using the sequence activity model
to identify one or more amino acid residues at specific positions in the systematically varied 
sequences comprises identifying one or more sequences for use in generating a new protein 
variant library. 

18. (withdrawn) The method of claim 17, wherein the sequences are 
oligonucleotide sequences encoding variations of the one or more identified amino acid residues.

19. (withdrawn) The method of claim 18, further comprising performing
mutagenesis or a recombination-based diversity generation mechanism using the oligonucleotide 
sequences to generate the new protein variant library. 

20. (withdrawn) The method of claim 19, wherein performing mutagenesis or a 
recombination-based diversity generation mechanism is used in a directed evolution procedure. 

21. (withdrawn) The method of claim 18, wherein the oligonucleotide sequences 
encode at least a portion of (i) a naturally occurring parent protein having the highest activity 
among naturally occurring parent proteins, or (ii) a sequence predicted by the sequence activity 
model to have the highest activity. 

22. (withdrawn) The method of claim 17, further comprising developing a new 
sequence activity model using activity and sequence data characterizing the new protein variant 
library.

23. (withdrawn) The method of claim 17, further comprising selecting one or more
members of the new protein variant library for production. 

24. (withdrawn) The method of claim 23, further comprising expressing one or 
more of the selected members of the new protein variant library. 

25. (withdrawn) The method of claim 23, further comprising: 

(i) providing an expression system from which a selected member of the new protein 
variant library can be expressed; and 

(ii) expressing the selected member of the new protein variant library. 

26. (withdrawn) The method of claim 1, wherein the one or more amino acid 
residues identified in (c) are identified in a reference sequence predicted using the sequence
activity model or a reference sequence that describes a member of the protein variant library.

27. (withdrawn) A method for identifying amino acid residues for variation in a 
protein variant library in order to affect a desired activity, said method comprising: 

(a) receiving data characterizing a training set of a protein variant library comprising 
proteins that were obtained by performing classical or synthetic DNA shuffling on 
nucleic acids encoding all or part of one or more naturally occurring parent proteins, 
wherein the data provides activity and sequence information for each protein variant in 
the training set; 

(b) from the data, developing a sequence activity model that predicts activity as a
function of amino acid residue type and corresponding position in the sequence; and 

(c) using the sequence activity model to identify one or more amino acid residues, in 
proteins of the library, that are to be varied in order to impact the desired activity. 

28. (withdrawn) A method for identifying amino acid residues for variation in a 
protein variant library in order to affect a desired activity, said method comprising: 

(a) receiving data characterizing a training set of a protein variant library, wherein the 
data provides activity and sequence information for each protein variant in the training 
set; 

(b) from the data, developing a sequence activity model that predicts activity as a 
function of amino acid residue type and corresponding position in the sequence; and 

(c) using the sequence activity model to identify one or more amino acid residues, in 
proteins of the protein variant library, that are to be varied in order to identify one or more 
sequences for use in a directed evolution procedure. 

29. (withdrawn) The method of claim 28, wherein the sequences are 
oligonucleotide sequences encoding variations of the one or more identified amino acid residues.

30. (withdrawn) A method for identifying amino acid residues for variation in a 
protein variant library in order to affect a desired activity, said method comprising:

(a) receiving data characterizing a training set of a protein variant library, wherein the 
data provides activity and sequence information for each protein variant in the training 
set; 

(b) from the data, developing a sequence activity model that predicts activity as a 
function of amino acid residue type and corresponding position in the sequence;

(c) using the sequence activity model to rank residue positions or residue types at 
specific residue positions in order of impact on the desired activity; 

(d) using the ranking to identify one or more amino acid residues, in proteins of the 
protein variant library, that are to be varied or fixed in order to impact the desired activity. 
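
Illustrative note (not part of the claims): the ranking recited in steps (c) and (d) of claim 30 can be pictured with a minimal sketch. The sketch assumes a simple main-effects estimate (mean activity of variants carrying a residue minus the overall mean) in place of a fitted regression, partial least squares, or neural network model. The function names and the toy training data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def residue_effects(training_set):
    """Estimate a main effect for each (position, residue) pair.

    training_set is a list of (sequence, activity) pairs of equal-length
    sequences. The effect is the mean activity of the variants carrying that
    residue at that position minus the overall mean activity.
    """
    overall = mean(activity for _, activity in training_set)
    grouped = defaultdict(list)
    for sequence, activity in training_set:
        for position, residue in enumerate(sequence):
            grouped[(position, residue)].append(activity)
    return {key: mean(values) - overall for key, values in grouped.items()}


def rank_positions(effects):
    """Rank positions by the spread between their best and worst residue effects."""
    by_position = defaultdict(list)
    for (position, _residue), effect in effects.items():
        by_position[position].append(effect)
    spread = {p: max(v) - min(v) for p, v in by_position.items()}
    return sorted(spread, key=spread.get, reverse=True)


# Hypothetical toy data: position 1 drives activity, position 0 never varies.
training_set = [("AKL", 1.0), ("AEL", 3.0), ("AKV", 1.2), ("AEV", 3.2)]
effects = residue_effects(training_set)
ranking = rank_positions(effects)
```

In this sketch, positions near the top of `ranking` would be candidates for variation and positions near the bottom candidates to hold fixed; a real model would also need to handle noise and interactions between positions.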

31. (withdrawn) A method for generating an optimized protein variant library, said 
method comprising: 

(a) receiving data characterizing a training set of a protein variant library, 
wherein protein variants in the library have systematically varied sequences, and 
wherein the data provides activity and sequence information for each protein variant in 
the training set; 

(b) from the data, developing a sequence activity model that predicts activity as a
function of amino acid residue type and corresponding position in the sequence;

(c) using the sequence activity model to select one or more amino acid residues at 
specific positions in the systematically varied sequences that are predicted to provide desired 
activity; 

(d) generating an optimized protein variant library, 

wherein the sequences of the members of the optimized protein variant library each 
comprise the one or more selected amino acid residues. 

32. (withdrawn) A computer program product comprising a computer readable 
medium on which is provided program instructions for identifying amino acid residues for 
variation in a protein variant library in order to affect a desired activity, said instructions 
comprising: 

(a) code for receiving data characterizing a training set of a protein variant library, 
wherein protein variants in the library have systematically varied sequences, and 
wherein the data provides activity and sequence information for each protein variant in
the training set;

(b) code for using the data to develop a sequence activity model that predicts activity 
as a function of amino acid residue type and corresponding position in the sequence; and 

(c) code for using the sequence activity model to identify one or more amino acid 
residues at specific positions in the systematically varied sequences that are to be varied in order 
to impact the desired activity. 

33. (withdrawn) The computer program product of claim 32, wherein the program 
instructions further comprise: 

(d) code for using the sequence activity model to identify one or more amino acid 
residues that are to remain fixed in a new protein variant library. 

34. (withdrawn) The computer program product of claim 32, wherein the program 
instructions further comprise code for performing DOE to identify the systematically varied 
sequences. 

35. (withdrawn) The computer program product of claim 32, wherein the sequence 
activity model is a regression model. 

36. (withdrawn) The computer program product of claim 32, wherein the sequence 
activity model is a partial least squares model. 

37. (withdrawn) The computer program product of claim 32, wherein the sequence
activity model is a neural network. 

38. (withdrawn) The computer program product of claim 32, wherein the code for 
using the sequence activity model comprises code for identifying a sequence predicted by the 
model to have a highest value of the desired activity. 

39. (withdrawn) The computer program product of 38, wherein the code for using 
the model further comprises code for selecting subsequences of the best sequence. 

40. (withdrawn) The computer program product of claim 32, wherein the code for 
using the sequence activity model to identify one or more amino acid residues comprises code 
for using the sequence activity model to rank residue positions in order of impact on the desired 
activity. 

41. (withdrawn) The computer program product of claim 32, wherein the code for 
using the sequence activity model to identify one or more amino acid residues comprises code 
for using the sequence activity model to rank residue types at residue positions in order of impact 
on the desired activity. 

42. (withdrawn) The computer program product of claim 32, wherein the code for 
using the model comprises code for using the model as a fitness function in a genetic algorithm. 

43. (withdrawn) The computer program product of claim 32, wherein the code for 
using the sequence activity model to identify one or more amino acid residues at specific 
positions in the systematically varied sequences comprises code for identifying one or more 
sequences for use in generating a new protein variant library. 

44. (withdrawn) The computer program product of claim 43, wherein the sequences 
are oligonucleotide sequences encoding variations of the one or more identified amino acid 
residues. 

45. (withdrawn) The computer program product of claim 44, wherein the 
oligonucleotide sequences encode at least a portion of (i) a naturally occurring parent protein 
having the highest activity among naturally occurring parent proteins, or (ii) a sequence 
predicted by the sequence activity model to have the highest activity. 

46. (withdrawn) The computer program product of claim 43, further comprising
code for developing a new sequence activity model using activity and sequence data
characterizing the new protein variant library.

47. (withdrawn) The computer program of claim 43, further comprising code for 
selecting one or more members of the new protein variant library for production.

48. (withdrawn) The computer program product of claim 32, wherein the code in 
(c) identifies the one or more amino acid residues in (i) a reference sequence predicted using the 
sequence activity model or (ii) a reference sequence that describes a member of the protein
variant library. 

49. (withdrawn) A computer program product comprising a computer readable 
medium on which is provided program instructions for identifying amino acid residues for 
variation in a protein variant library in order to affect a desired activity, said program instructions 
comprising: 

(a) code for receiving data characterizing a training set of a protein variant library 
comprising proteins that were obtained by performing classical or synthetic DNA 
shuffling on nucleic acids encoding all or part of one or more naturally occurring parent 
proteins, wherein the data provides activity and sequence information for each protein 
variant in the training set; 

(b) code for using the data to develop a sequence activity model that predicts activity 
as a function of amino acid residue type and corresponding position in the sequence; and 

(c) code for using the sequence activity model to identify one or more amino acid 
residues, in proteins of the library, that are to be varied in order to impact the desired activity. 

50. (withdrawn) A computer program product comprising a machine readable 
medium on which is provided program instructions for identifying amino acid residues for 
variation in a protein variant library in order to affect a desired activity, said program instructions 
comprising: 

(a) code for receiving data characterizing a training set of a protein variant library, 
wherein the data provides activity and sequence information for each protein variant in 
the training set; 

(b) code for using the data to develop a sequence activity model that predicts activity 
as a function of amino acid residue type and corresponding position in the sequence; and 

(c) code for using the sequence activity model to identify one or more amino acid 
residues, in proteins of the protein variant library, that are to be varied in order to identify one or 
more sequences for use in a directed evolution procedure. 

51. (withdrawn) A computer program product comprising a machine readable
medium on which is provided program instructions for identifying amino acid residues for
variation in a protein variant library in order to affect a desired activity, said program instructions
comprising: 

(a) code for receiving data characterizing a training set of a protein variant library,
wherein the data provides activity and sequence information for each protein variant in
the training set; 

(b) code for using the data to develop a sequence activity model that predicts activity 
as a function of amino acid residue type and corresponding position in the sequence; 

(c) code for using the sequence activity model to rank residue positions or residue 
types at specific residue positions in order of impact on the desired activity; 

(d) code for using the ranking to identify one or more amino acid residues, in proteins 
of the protein variant library, that are to be varied or fixed in order to impact the desired activity. 

52. (withdrawn) A computer program product comprising a machine readable 
medium on which is provided program instructions for generating an optimized protein variant 
library, said program instructions comprising: 

(a) code for receiving data characterizing a training set of a protein variant library, 
wherein protein variants in the library have systematically varied sequences, and 
wherein the data provides activity and sequence information for each protein variant in
the training set;

(b) code for using the data to develop a sequence activity model that predicts activity 
as a function of amino acid residue type and corresponding position in the sequence; 

(c) code for using the sequence activity model to select one or more amino acid 
residues at specific positions in the systematically varied sequences that are predicted to provide 
desired activity; 

(d) code for defining an optimized protein variant library, 

wherein the sequences of the members of the optimized protein variant library each 
comprise the one or more selected amino acid residues. 

53. (withdrawn) A method of identifying members of a population of biopolymer 
sequence variants most suitable for artificial evolution, the method comprising: 

(a) selecting or screening the members of a population of biopolymer sequence variants 
for two or more desired objectives to produce a multi-objective fitness data set; 

(b) identifying a Pareto front in the multi-objective fitness data set; and, 

(c) selecting one or more members proximal to the Pareto front, thereby identifying the 
members of the population of biopolymer sequence variants most suitable for artificial evolution. 

54. (withdrawn) The method of claim 53, wherein step (c) comprises:

(i) calculating a weighted sum of the two or more desired objectives for at least
some of the members proximal to the Pareto front; and

(ii) selecting at least one member comprising a higher weighted sum than other
members proximal to the Pareto front.

55. (withdrawn) The method of claim 53, wherein step (c) comprises: 

(i) ranking the one or more members according to relative proximity to the Pareto 
front and relative isolation in sequence space; and,

(ii) selecting at least one member that ranks higher than other members proximal
to the Pareto front.
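
Illustrative note (not part of the claims): the Pareto front of claim 53 and the weighted-sum selection of claim 54 can be sketched as follows. The sketch assumes every objective is to be maximized and, for simplicity, treats only members lying on the front as "proximal" to it. The variant names, scores, and weights are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is no worse than b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(variants):
    """Return the names of variants that no other variant dominates."""
    return [
        name
        for name, scores in variants.items()
        if not any(
            dominates(other, scores)
            for other_name, other in variants.items()
            if other_name != name
        )
    ]


def best_by_weighted_sum(variants, front, weights):
    """Pick the front member with the highest weighted sum of objectives."""
    return max(
        front,
        key=lambda name: sum(w * s for w, s in zip(weights, variants[name])),
    )


# Hypothetical two-objective fitness data (for example, activity and stability).
variants = {
    "v1": (0.9, 0.2),
    "v2": (0.6, 0.6),
    "v3": (0.2, 0.9),
    "v4": (0.5, 0.5),
    "v5": (0.1, 0.1),
}
front = pareto_front(variants)
chosen = best_by_weighted_sum(variants, front, weights=(0.5, 0.5))
```

Here `v4` and `v5` are dominated by `v2`, so the front is `v1`, `v2`, `v3`, and equal weights select `v2`. The pairwise comparison is quadratic in the number of variants, which is adequate for a sketch but not for large populations.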

56. (withdrawn) A computer program product comprising a computer readable 
medium having one or more logic instructions for:

(a) applying one or more multi-objective evolutionary algorithms to at least one parental
biopolymer sequence to produce a set of biopolymer sequence variants;

(b) selecting or screening the members of the set of biopolymer sequence variants for two 
or more desired objectives; 

(c) plotting the set of biopolymer sequence variants as a function of the two or more 
desired objectives to produce a biopolymer sequence variant plot; and, 

(d) identifying a Pareto front in the biopolymer sequence variant plot to identify the
members of the set of biopolymer sequence variants comprising multiple improved objectives 
relative to other members of the set of biopolymer sequence variants. 

57. (withdrawn) A method of predicting sequences that comprise desired properties, 
the method comprising: 

(a) evolving at least one parental sequence using at least one artificial evolution 
procedure to produce at least one population of artificially evolved sequences; 

(b) selecting or screening the population of artificially evolved sequences for at least one 
desired property to produce a population of selected artificially evolved sequences; 

(c) training a neural network with the population of selected artificially evolved
sequences to produce a trained neural network; and, 

(d) predicting one or more sequences that comprise the at least one desired property using the 
trained neural network. 

58. (withdrawn) A computer system for predicting sequences that comprise desired 
properties, comprising: 



PAGE 14/46 f RCVD AT 9/1 5/2006 5:54:55 PM [Eastern Daylight Time] * SVR:USPTO-EFXRF-2/8 * DNIS:2738300 * CSID:5106630920 « DURATION (mm-ss): 1 M6 



SEP. 15. 2006 3:01PM 5106630920 NO. 956 P. 15" 



(a) at least one computer system comprising a neural network and a database capable of 
storing sequences; and, 

(b) system software comprising one or more logic instructions for: 

(i) evolving at least one parental sequence using at least one artificial evolution 
procedure to produce at least one population of artificially evolved sequences; 

(ii) selecting or screening the population of artificially evolved sequences for at 
least one desired property to produce a population of selected artificially evolved 
sequences; 

(iii) training the neural network with the population of selected artificially
evolved sequences to produce a trained neural network; and 

(iv) predicting one or more sequences that comprise the at least one desired 
property using the trained neural network. 

59. (withdrawn) A computer program product for predicting sequences that 
comprise desired properties, comprising a computer readable medium having one or more logic 
instructions for:

(a) evolving at least one parental sequence using at least one artificial evolution 
procedure to produce at least one population of artificially evolved sequences; 

(b) selecting or screening the population of artificially evolved sequences for at least one 
desired property to produce a population of selected artificially evolved sequences; 

(c) training a neural network with the population of selected artificially evolved 
sequences to produce a trained neural network; and, 

(d) predicting one or more sequences that comprise the at least one desired property using 
the trained neural network.

60. (withdrawn) A method of predicting at least one property of at least one target 
polypeptide sequence, the method comprising:

(a) identifying one or more motifs common to two or more members of a population of
polypeptide sequence variants, wherein at least a subset of the population of polypeptide 
sequence variants comprises the at least one property, to produce a motif data set; 

(b) correlating at least one motif from the motif data set with the at least one property to 
produce a motif scoring function; and, 

(c) scoring the at least one target polypeptide sequence using the motif scoring function, 
thereby predicting the at least one property of the at least one target polypeptide sequence. 
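
Illustrative note (not part of the claims): steps (a) through (c) of claim 60 can be pictured with a minimal sketch. It assumes that a motif is any contiguous subsequence of fixed length `k` shared by at least two variants, and that the motif scoring function is the mean property value of the variants carrying the motif minus the overall mean. Both choices, along with the names and toy data, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def motif_scores(variants, k=3):
    """Build a motif scoring table from (sequence, property_value) pairs.

    Only motifs found in two or more variants are kept. Each score is the mean
    property value of the carriers minus the mean over all variants.
    """
    overall = mean(value for _, value in variants)
    carriers = defaultdict(list)
    for sequence, value in variants:
        # A set avoids counting a motif twice within one sequence.
        motifs = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
        for motif in motifs:
            carriers[motif].append(value)
    return {m: mean(v) - overall for m, v in carriers.items() if len(v) >= 2}


def score_target(target, scores, k=3):
    """Score a target sequence by summing the scores of the motifs it contains."""
    return sum(
        scores.get(target[i:i + k], 0.0) for i in range(len(target) - k + 1)
    )


# Hypothetical variants: those starting with "MKL" show the higher property value.
variants = [("MKLVA", 2.0), ("MKLGA", 1.8), ("MRTVA", 0.4), ("MRTGA", 0.2)]
scores = motif_scores(variants)
favorable = score_target("MKLTA", scores)
unfavorable = score_target("MRTLA", scores)
```

With this toy data only `MKL` and `MRT` occur in two variants, so a target containing `MKL` scores above zero and one containing `MRT` scores below zero.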

61. (withdrawn) A system for predicting at least one property of at least one target 
polypeptide sequence, comprising: 

(a) at least one computer comprising a database capable of storing sequences; and, 



(b) system software comprising one or more logic instructions for: 

(i) identifying one or more motifs common to two or more members of a
population of polypeptide sequence variants, wherein at least a subset of the population 
of polypeptide sequence variants comprises the at least one property, to produce a motif 
data set; 

(ii) correlating at least one motif from the motif data set with the at least one 
property to produce a motif scoring function; and 

(iii) scoring the at least one target polypeptide sequence using the motif scoring 
function to predict the at least one property of the at least one target polypeptide
sequence. 

62. (withdrawn) A computer program product for predicting at least one property of
at least one target polypeptide sequence, comprising a computer readable medium having one or 
more logic instructions for: 

(a) identifying one or more motifs common to two or more members of a population of
polypeptide sequence variants, wherein at least a subset of the population of polypeptide 
sequence variants comprises the at least one property, to produce a motif data set; 

(b) correlating at least one motif from the motif data set with the at least one property to 
produce a motif scoring function; and, 

(c) scoring the at least one target polypeptide sequence using the motif scoring function
to predict the at least one property of the at least one target polypeptide sequence. 

63. (withdrawn) A system for predicting sequence activities, comprising: 

(a) at least one computer comprising a database capable of storing sequences; and, 

(b) system software comprising one or more logic instructions for: 

(i) selecting a set of parental sequences for at least one activity to produce a set of 
selected parental sequences; 

(ii) subjecting the set of selected parental sequences to one or more artificial 
evolution procedures to produce a set of evolved sequences; 

(iii) selecting the set of evolved sequences for the at least one activity to produce 
a set of selected evolved sequences; 

(iv) providing a sequence-activity plot for the set of sequence variants; and 

(v) predicting at least one activity of one or more sequences from the sequence- 
activity plot. 

64. (withdrawn) A computer program product for predicting sequence activities, 
comprising a computer readable medium having one or more logic instructions for: 

(a) selecting a set of parental sequences for at least one activity to produce a set of
selected parental sequences; 

(b) subjecting the set of selected parental sequences to one or more artificial evolution 
procedures to produce a set of evolved sequences; 

(c) selecting the set of evolved sequences for the at least one activity to produce a set of 
selected evolved sequences; 

(d) providing a sequence-activity plot for the set of sequence variants; and,

(e) predicting at least one activity of one or more sequences from the sequence-activity
plot.

65. (withdrawn) A method of producing libraries of desired sizes, the method 
comprising: 

(a) identifying one or more homologues of at least one initial polypeptide sequence; 

(b) comparing the sequences of the homologue(s) and the initial polypeptide; 

(c) identifying variable amino acid residues, wherein variable amino acid residues differ 
with respect to residue type at corresponding positions in the sequences of the homologue(s) and
the initial polypeptide sequence; 

(d) identifying a set of evolutionarily conserved variable amino acid residues; and 

(e) generating a library of protein variants incorporating the set of evolutionarily 
conserved variable amino acid residues. 

66. (withdrawn) The method of claim 65, wherein step (b) comprises using at least 
one substitution matrix to identify the set of evolutionarily conserved variable amino acid
residues. 

67. (withdrawn) The method of claim 65, wherein the library produced by the 
method comprises a high average fitness as compared to the fitness of the initial polypeptide 
sequence. 

68. (withdrawn) The method of claim 65, wherein the homologues comprise a 
phylogenetic family of polypeptides. 

69. (withdrawn) The method of claim 65, further comprising screening or selecting 
members of the library provided in step (e) for one or more desired properties. 

70. (withdrawn) The method of claim 65, further comprising repeating steps (a)-(e)
using at least one screened or selected member as the at least one initial polypeptide in a 
repeated step (a). 

71. (withdrawn) A system for producing libraries of desired sizes, comprising:

(a) at least one computer comprising a database capable of storing sets of polypeptide 
sequences; and, 

(b) system software comprising one or more logic instructions for: 

(i) identifying one or more homologues of at least one initial polypeptide
sequence from a selected evolutionary timescale;

(ii) comparing the sequences of the homologue(s) and the initial polypeptide; 

(iii) identifying variable amino acid residues, wherein variable amino acid
residues differ with respect to residue type at corresponding positions in the sequences of
the homologue(s) and the initial polypeptide sequence; and

(iv) identifying a set of evolutionarily conserved variable amino acid residues.

72. (withdrawn) The system of claim 71 wherein the system software further 
comprises logic instructions for: 

(v) identifying a set of oligonucleotide sequences that collectively encode 
polypeptide variants of the initial polypeptide, wherein the set comprises oligonucleotides that 
encode the set of evolutionarily conserved variable amino acid residues.

73. (withdrawn) A computer program product for producing libraries of desired 
sizes, comprising a computer readable medium having one or more logic instructions for:

(i) identifying one or more homologues of at least one initial polypeptide 
sequence from a selected evolutionary timescale;

(ii) comparing the sequences of the homologue(s) and the initial polypeptide;

(iii) identifying variable amino acid residues, wherein variable amino acid residues differ 
with respect to residue type at corresponding positions in the sequences of the homologue(s) and 
the initial polypeptide sequence; and 

(iv) identifying a set of evolutionarily conserved variable amino acid residues.

74. (withdrawn) The method of claim 1, wherein developing the sequence activity 
model comprises applying principal component regression to the activity and sequence 
information. 

75. (withdrawn) The method of claim 1, wherein developing the sequence activity 
model comprises using a support vector machine with the activity and sequence information. 

76. (Previously presented) A method for identifying nucleotides for variation in
nucleic acids encoding a protein variant library in order to affect a desired activity, said method
comprising:

(a) receiving data characterizing a training set of a protein variant library, wherein the 
data provides activity and nucleotide sequence information for each protein variant in the 
training set; 

(b) from the data, developing a sequence activity model that predicts activity as a 
function of nucleotide types and corresponding position in the nucleotide sequence; 

(c) using the sequence activity model to rank positions in a nucleotide sequence
and/or nucleotide types at specific positions in the nucleotide sequence in order of impact 
on the desired activity; 

(d) using the ranking to identify one or more nucleotides, in the nucleotide sequence,
that are to be varied or fixed in order to impact the desired activity. 

77. (Previously presented) The method of claim 76, wherein the nucleotides to be 
varied are codons encoding particular amino acids. 

78. (Previously presented) The method of claim 77, wherein the activity is a function
of expression of nucleic acids. 

79. (Previously presented) A computer program product comprising a machine 
readable medium on which is provided program instructions for identifying nucleotides for 
variation in nucleic acids encoding a protein variant library in order to affect a desired activity, 
said instructions comprising: 

(a) code for receiving data characterizing a training set of a protein variant library, 
wherein the data provides activity and nucleotide sequence information for each protein 
variant in the training set; 

(b) code for developing a sequence activity model from the data, which sequence 
activity model predicts activity as a function of nucleotide types and corresponding 
position in the nucleotide sequence; 

(c) code for using the sequence activity model to rank positions in a nucleotide 
sequence and/or nucleotide types at specific positions in the nucleotide sequence in order 
of impact on the desired activity; 

(d) code for using the ranking to identify one or more nucleotides, in the nucleotide 
sequence, that are to be varied or fixed in order to impact the desired activity. 

80. (Previously presented) The computer program product of claim 79, wherein the 
nucleotides to be varied are codons encoding particular amino acids. 

81. (Previously presented) The computer program product of claim 79, wherein the 
activity is a function of expression of nucleic acids. 

82. (withdrawn) A method of defining a library of biological molecules, the method 
comprising: 

(a) receiving an original set of data points representing activity and sequence of multiple 
biological molecules in a training set; 

(b) constructing a bootstrap set of data points selected, with replacement, from the 
original set of data points; 

(c) generating a model from the bootstrap set, which model comprises indicators of the 
relative importance of individual residues or other units in biological molecules represented by 
the data points in the bootstrap set; 

(d) repeating (b) and (c) multiple times to generate multiple values of each indicator from 
the model generated in (c); 

(e) for each indicator, determining (i) an average or mean value of the multiple values 
and (ii) a statistical indication of the distribution of the multiple values; 

(f) ranking the individual residues or other units on basis of their respective values of (i) 
and (ii) determined in (e); and 

(g) toggling particular ones of the individual residues or other units based on rankings 
produced in (f) to thereby define the library of biological molecules. 
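
The bootstrap procedure of steps (b) through (f) of claim 82 can be sketched as follows. This is a minimal illustration under stated assumptions: the "model" is a simple per-residue mean effect rather than a regression such as PLS or PCR, the ranking score is the absolute mean divided by the standard deviation, and all function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def residue_effects(data):
    """(c): assumed stand-in model. Indicator for (position, residue) is the
    mean activity of molecules carrying that residue minus the overall mean."""
    overall = mean(activity for _, activity in data)
    groups = defaultdict(list)
    for seq, activity in data:
        for pos, res in enumerate(seq):
            groups[(pos, res)].append(activity)
    return {key: mean(vals) - overall for key, vals in groups.items()}


def bootstrap_indicators(data, n_rounds=200, seed=0):
    """(b), (d), (e): resample with replacement, refit, and summarize."""
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(n_rounds):
        # (b) same size as the original set, drawn with replacement, so a
        # molecule may appear twice or not at all in a given bootstrap set.
        boot = [rng.choice(data) for _ in data]
        for key, value in residue_effects(boot).items():
            samples[key].append(value)
    # (e) mean and standard deviation of each indicator across rounds.
    return {k: (mean(v), stdev(v)) for k, v in samples.items() if len(v) > 1}


def rank_indicators(summary):
    """(f): rank by |mean| / standard deviation (an assumed scoring rule)."""
    def score(key):
        m, s = summary[key]
        if s > 0:
            return abs(m) / s
        return float("inf") if m != 0 else 0.0
    return sorted(summary, key=score, reverse=True)


# Hypothetical toy data: two-residue sequences with measured activities.
data = [("AK", 1.0), ("AR", 1.1), ("GK", 3.0),
        ("GR", 3.2), ("AK", 0.9), ("GR", 3.1)]
summary = bootstrap_indicators(data)
ranked = rank_indicators(summary)
```

Residues at the top of `ranked` have effects that are both large and stable across bootstrap sets, which makes them natural candidates for the toggling of step (g).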

83. (withdrawn) The method of claim 82, wherein the original set of data points is 
generated by systematic variation of a starting sequence. 

84. (withdrawn) The method of claim 82, wherein (b) comprises constructing the 
bootstrap set with two or more occurrences of a data point representing the same biological 
molecule. 

85. (withdrawn) The method of claim 82, wherein (b) comprises constructing the 
bootstrap set with no occurrence of a data point having a particular residue or other unit found in 
at least one of the multiple biological molecules in the training set. 

86. (withdrawn) The method of claim 82, wherein (c) comprises using a regression 
technique to generate the model. 

87. (withdrawn) The method of claim 86, wherein the regression technique is PLS 
or PCR. 

95. (withdrawn) The computer program product of claim 94, wherein (b) comprises 
code for constructing the bootstrap set with two or more occurrences of a data point representing 
the same biological molecule. 

96. (withdrawn) The computer program product of claim 94, wherein (b) comprises 
code for constructing the bootstrap set with no occurrence of a data point having a particular 
residue or other unit found in at least one of the multiple biological molecules in the training set. 

97. (withdrawn) The computer program product of claim 94, wherein (c) comprises 
code for using a regression technique to generate the model. 

98. (withdrawn) The computer program product of claim 97, wherein the regression 
technique is PLS or PCR. 

99. (withdrawn) The computer program product of claim 94, further comprising 
code for generating a p-value for each indicator, wherein the p-value is generated from a mean 
and standard deviation of multiple values for each indicator. 

100. (withdrawn) The computer program product of claim 99, wherein (f) comprises 
code for ranking the individual residues or other units on the basis of the p-values. 
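
One way the p-value of claim 99 and the ranking of claim 100 might be computed is sketched below. The claims state only that the p-value is generated from a mean and standard deviation; the two-sided test against zero under a normal approximation is an assumption made here for illustration, and the function names and indicator labels are hypothetical.

```python
import math


def p_value(mean_value, std_dev):
    """Two-sided p-value for 'indicator differs from zero', assuming the
    bootstrap values of the indicator are approximately normal."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    z = abs(mean_value) / std_dev
    return math.erfc(z / math.sqrt(2.0))


def rank_by_p_value(indicators):
    """indicators maps a label to (mean, standard deviation).
    Returns labels ordered from smallest to largest p-value."""
    return sorted(indicators, key=lambda k: p_value(*indicators[k]))


# Hypothetical indicator summaries: (mean, standard deviation).
indicators = {
    "pos12:Ala": (1.5, 0.5),
    "pos40:Gly": (0.2, 0.6),
    "pos7:Ser": (-0.9, 0.45),
}
order = rank_by_p_value(indicators)
```

Indicators with the smallest p-values are those whose mean is far from zero relative to its spread, so they appear first in `order`.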